Extending and Improving Wordnet via Unsupervised Word Embeddings
نویسندگان
چکیده
This work presents an unsupervised approach for improving WordNet that builds upon recent advances in document and sense representation via distributional semantics. We apply our methods to construct Wordnets in French and Russian, languages which both lack good manual constructions.1 These are evaluated on two new 600-word test sets for word-to-synset matching and found to improve greatly upon synset recall, outperforming the best automated Wordnets in F-score. Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resources languages, and may further be applied to sense clustering and other Wordnet improvements.
منابع مشابه
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...
متن کاملUsing Word Embeddings for Bilingual Unsupervised WSD
Unsupervised Word Sense Disambiguation (WSD) is one of the challenging problems in natural language processing. Recently, an unsupervised bilingual WSD approach has been proposed. This approach uses context aware EM formulation for estimating the sense distribution by using the co-occurrence counts of cross-linked words in comparable corpora. WordNetbased similarity measures are used for approx...
متن کاملAutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes
We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resource, bu...
متن کاملSupervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambigous Synonyms
This paper compares two approaches to word sense disambiguation using word embeddings trained on unambiguous synonyms. The first one is an unsupervised method based on computing log probability from sequences of word embedding vectors, taking into account ambiguous word senses and guessing correct sense from context. The second method is supervised. We use a multilayer neural network model to l...
متن کاملAn Iterative Approach for Unsupervised Most Frequent Sense Detection using WordNet and Word Embeddings
Given a word, what is the most frequent sense in which it occurs in a given corpus? Most Frequent Sense (MFS) is a strong baseline for unsupervised word sense disambiguation. If we have large amounts of sense-annotated corpora, MFS can be trivially created. However, senseannotated corpora are a rarity. In this paper, we propose a method which can compute MFS from raw corpora. Our approach itera...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1705.00217 شماره
صفحات -
تاریخ انتشار 2017